MTAC - A Multithreaded VLIW Architecture for PRAM Simulation
Abstract
The high latency of memory operations is a problem in both sequential and parallel computing. Multithreading is a technique that can be used to eliminate the delays caused by this latency: the processor executes other processes (threads) while one process waits for a memory operation to complete. In this paper we investigate the implementation of multithreading at the processor level. As a result, we outline and evaluate a MultiThreaded VLIW processor Architecture with functional unit Chaining (MTAC), which is designed specifically for PRAM-style parallelism. According to our experiments, MTAC offers remarkably better performance than a basic pipelined RISC architecture, and chaining improves the exploitation of instruction-level parallelism to a level where the achieved speedup corresponds to the number of functional units in a processor.
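The latency-hiding idea in the abstract can be illustrated with a toy cycle-level model. This is only a sketch under simplifying assumptions that are not taken from the paper: a single-issue processor, every instruction is a memory reference that blocks its thread for a fixed number of cycles, and threads are interleaved round-robin. The function name `utilization` and its parameters are hypothetical.

```python
def utilization(num_threads: int, latency: int, cycles: int = 10_000) -> float:
    """Fraction of cycles in which the processor issues an instruction.

    Toy model (an assumption, not the MTAC design): each issued instruction
    is a memory reference that keeps its thread blocked for `latency`
    further cycles after the issue cycle.
    """
    ready_at = [0] * num_threads  # first cycle at which each thread may issue
    issued = 0
    next_thread = 0
    for cycle in range(cycles):
        # Round-robin search for a thread whose memory operation has completed.
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + 1 + latency
                issued += 1
                next_thread = (t + 1) % num_threads
                break
        # If no thread was ready, the processor idles for this cycle.
    return issued / cycles


if __name__ == "__main__":
    for threads in (1, 5, 10, 20):
        print(threads, utilization(threads, latency=9))
```

In this model the processor stays fully busy once the number of threads reaches `latency + 1`; with fewer threads it idles while all of them wait on memory.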
Similar resources
Code Generation and Global Optimization Techniques for a Reconfigurable PRAM-NUMA Multicore Architecture
In this thesis we describe techniques for code generation and global optimization for a PRAM-NUMA multicore architecture. We specifically focus on the REPLICA architecture, which is a family of massively multithreaded very long instruction word (VLIW) chip multiprocessors with chained functional units that has a reconfigurable emulated shared on-chip memory. The on-chip memory system supports two e...
Simultaneous Multithreaded DSPs: Scaling from High Performance to Low Power
In the DSP world, many media workloads have to perform a specific amount of work in a specific period of time. This observation led us to examine how we can exploit Simultaneous Multithreading for VLIW DSP architectures to: 1) increase throughput in situations where performance is the most important attribute (e.g., base station workloads) and 2) decrease power consumption in situations where p...
Performance Evaluation of a Non-Blocking Multithreaded Architecture for Embedded, Real-Time and DSP Applications
This paper presents the evaluation of a non-blocking, decoupled memory/execution, multithreaded architecture known as the Scheduled Dataflow (SDF). The major recent trend in digital signal processor (DSP) architecture is to use complex organizations to exploit instruction level parallelism (ILP). The two most common approaches for exploiting the ILP are Superscalars and Very Long Instruction Wo...
Superscalar Performance in a Multithreaded Microprocessor
Multithreaded processors, having hardware support for the concurrent execution of fine-grained threaded computations, are noted for their latency tolerance and low-cost synchronization. Multithreading is a technique for improving the utilization of processing elements (PEs) in parallel processing systems, thereby reducing cost/performance ratios. With increasing integrated circuit densities it ...
Shared memory with hidden latency on a family of mesh-like networks
In this thesis we consider the general problem of how to provide a shared memory model on a network of processors where memory is physically distributed among the processors. In particular, we consider the simulation of an EREW PRAM model on a family of mesh and ring like networks, and we are interested in latency hiding simulations. Our goal is first to provide a simulation which has delay pro...
Journal:
- J. UCS
Volume 3
Publication date: 1997